Boyer-Moore Strategy to Efficient Approximate String Matching
Authors
Abstract
We propose a simple but efficient algorithm for finding all occurrences of a pattern or a class of patterns (length $m$) in a text (length $n$) with at most $k$ mismatches. The algorithm relies on the Shift-Add algorithm of Baeza-Yates and Gonnet [6], which represents the current state of the search as a bit number and exploits the ability of programming languages to operate on machine words. The state representation must therefore not exceed the word size $\omega$, that is, $m(\lceil \log_2(k+1) \rceil + 1) \le \omega$. The algorithm consists of a preprocessing step and a searching step. It is linear and performs $3n$ operations during the searching step. Notions of shift and character skip, found in the Boyer-Moore (BM) approach [9], are introduced in this algorithm. Provided that the alphabet is large enough (compared to the pattern length), the average number of operations performed by our algorithm during the searching step becomes $n\left(2 + \frac{k+4}{m-k}\right)$.
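The Shift-Add state update that the abstract builds on can be sketched as follows. This is a minimal Python illustration of the basic counting idea only, not the authors' algorithm: it omits the Boyer-Moore-style shifts and character skips, the function name `shift_add_search` is hypothetical, and because Python integers are unbounded the word-size constraint is not enforced here.

```python
def shift_add_search(pattern, text, k):
    """Return (start, mismatches) for every window of text that matches
    pattern with at most k mismatches, using a Shift-Add style state."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")

    # Bits per field: ceil(log2(k + 1)) counting bits plus one overflow bit.
    # For non-negative integers, k.bit_length() equals ceil(log2(k + 1)).
    b = k.bit_length() + 1
    mask = (1 << (m * b)) - 1
    # Overflow (top) bit of every field.
    high = sum(1 << (i * b + b - 1) for i in range(m))

    # table[c] holds a 1 in field i whenever pattern[i] != c.
    all_ones = sum(1 << (i * b) for i in range(m))
    table = {}
    for c in set(pattern):
        table[c] = sum(1 << (i * b) for i in range(m) if pattern[i] != c)

    state = 0      # per-field mismatch counters
    overflow = 0   # sticky per-field overflow bits
    hits = []
    top_shift = (m - 1) * b
    for j, c in enumerate(text):
        # Shift every counter to the next pattern position, then add mismatches.
        state = ((state << b) + table.get(c, all_ones)) & mask
        # Move any overflow bits out of the counters so they cannot carry.
        overflow = ((overflow << b) | (state & high)) & mask
        state &= ~high
        # Field m-1 describes the window ending at j once a full window is read.
        if j >= m - 1 and (overflow >> top_shift) == 0:
            count = state >> top_shift
            if count <= k:
                hits.append((j - m + 1, count))
    return hits


if __name__ == "__main__":
    print(shift_add_search("abc", "abcxbcabd", 1))
```

Each text character costs one shift, one addition and a few masking operations, independent of $m$, as long as the whole state fits in a machine word in a fixed-width implementation.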
Similar resources
Approximate Boyer-Moore String Matching
The Boyer-Moore idea applied in exact string matching is generalized to approximate string matching. Two versions of the problem are considered. The k mismatches problem is to find all approximate occurrences of a pattern string (length m) in a text string (length n) with at most k mismatches. Our generalized Boyer-Moore algorithm is shown (under a mild independence assumption) to solve the pro...
The Filtering Approaches for the Improved Boyer-Moore Approximate String Matching
The Boyer-Moore algorithm solves exact string matching. Here, the Bad Character Rule of the Boyer-Moore algorithm is extended to solve approximate string matching. Although Tarhio and Ukkonen introduced a basic algorithm, it is similar to the Horspool algorithm. We use the concept of their algorithm to implement the Bad Character Rule and obtain a new shift length. When the wind...
String Matching in the DNA Alphabet
Searching for occurrences of string patterns is a common problem in many applications. Various good solutions have been presented for string matching. The most efficient solutions in practice are based on the Boyer–Moore algorithm [1]. A typical question in molecular biology is whether a given sequence has appeared elsewhere. In the following, we will concentrate on searching for exact occurrences...
Adapting Boyer-Moore-Like Algorithms for Searching Huffman Encoded Texts
In this paper we propose an efficient approach to the compressed string matching problem on Huffman encoded texts, based on the Boyer-Moore strategy. Once a candidate valid shift has been located, a subsequent verification phase checks whether the shift is codeword aligned by taking advantage of the skeleton tree data structure. Our approach leads to algorithms that exhibit a sublinear behavior...
Occurrence and Substring Heuristics for δ-Matching
We consider a version of pattern matching useful in processing large musical data: δ-matching, which consists in finding matches which are δ-approximate in the sense of the distance measured as maximum difference between symbols. The alphabet is an interval of integers, and the distance between two symbols a, b is measured as |a − b|. We also consider (δ, γ)-matching, where γ is a bound on the total sum of the diff...
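The matching condition described here can be made concrete with a naive check. The sketch below assumes the usual reading of the two bounds: δ limits the largest per-position difference, and γ (optional) limits the sum of differences over the window. It is a brute-force illustration with a hypothetical function name, not the occurrence or substring heuristics the paper studies.

```python
def delta_gamma_matches(pattern, text, delta, gamma=None):
    """Return the start indices of windows of text whose symbols differ from
    pattern by at most delta each and, if gamma is given, by at most gamma
    in total. Symbols are integers (for example, pitch values)."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    hits = []
    for s in range(len(text) - m + 1):
        # Per-position absolute differences |a - b| for this window.
        diffs = [abs(p - t) for p, t in zip(pattern, text[s:s + m])]
        if max(diffs) <= delta and (gamma is None or sum(diffs) <= gamma):
            hits.append(s)
    return hits


if __name__ == "__main__":
    melody = [60, 63, 64, 61, 62, 65, 70]
    print(delta_gamma_matches([60, 62, 64], melody, delta=1))
    print(delta_gamma_matches([60, 62, 64], melody, delta=1, gamma=1))
```

This takes time proportional to the product of the pattern and text lengths; it serves only to pin down the definition that faster heuristics must satisfy.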